Abstract


We show that it is impossible to “boost” the level of fault-tolerance of a system solving 
consensus by combining less fault-tolerant components into a more fault-tolerant system. To do 
this, we consider an asynchronous distributed computing model in which a known set of processes 
interact in two ways: by using reliable point-to-point channels, and by accessing shared services. 
Each of the shared services is connected to a subset of all the processes. 

Our boosting impossibility result is: for any f ≥ 1, the consensus problem is unsolvable in
this model in the presence of up to f process stopping failures, if each of the shared services
is assumed to tolerate only f − 1 process failures. This result holds regardless of the types of
the shared services and the pattern of connectivity of processes and services. In particular, it
is impossible to construct a protocol to solve the consensus problem for f process failures using
any number of consensus services that tolerate f − 1 process failures.

Interestingly, it is possible to boost the fault-tolerance level of a system solving problems easier than
consensus. For example, we show that the k-consensus problem is solvable for 2k − 1 failures using
only (consensus) services that tolerate only 1 failure apiece.


1 Introduction 


It is generally accepted that large distributed systems should be constructed from building blocks
(such as middleware-provided services) that interact with each other through well-defined interfaces.
Large systems must also tolerate a variety of types of failures. Establishing fault-tolerance
properties of a large system is difficult, as many scenarios have to be considered. A particularly
desirable approach is to “boost” the level of fault-tolerance by combining less fault-tolerant components
into a more fault-tolerant system. It is plausible that this might be achieved using techniques
such as quorums, replication, and redundancy.


In this paper, we demonstrate a fundamental limitation on this approach. Namely, we investigate
the possibility of fault-tolerance boosting for implementing a consensus service tolerant to
f stopping failures from underlying “subservices” that are tolerant to f − 1 stopping failures. We
show that, in the setting of purely asynchronous message passing, such fault-tolerance boosting
cannot be achieved, for any type of underlying services. That is, the availability of any set of
distributed services, each of which tolerates up to f − 1 stopping failures, is insufficient to construct
a consensus protocol that tolerates f failures.


In more detail, we consider a set of asynchronous processes of which f can fail by stopping, 
communicating with each other by sending messages through reliable point-to-point channels. In
addition, there is a set of services through which they can communicate implicitly. A process can 
invoke operations of a service by sending a message to one of its ports, and eventually get a response 
from the service. A process can invoke multiple operations on a service, and concurrently on other 
services. But before issuing a new operation on the same service, it must first wait for a response 
to the current invocation. Each service has a fixed set of “ports” and each port is hardwired 
to one process, where it receives invocations and returns responses to the corresponding process. 
Each service has some degree of fault tolerance, say f, which represents the number of failures among the (hardwired)
processes accessing it that the service can withstand before it may itself crash. This is intended to reflect the idea that services
are implemented by distributed algorithms, which run at a number of locations, represented by 
ports. The failure really affects the location, causing not only the failure of the process hardwired 
to the corresponding port, but also the failure of that part of the distributed implementation of 
the service which resides at that location. If a sufficient number (> f) of locations of a distributed 
implementation fail, then the implementation itself will fail. Note that this idea does not in any 
way prevent the use of arbitrary oracles in the implementation of a service, such as failure
detectors or powerful hardware concurrent objects. 


Notice that, except for the failure behavior, our services are just like the linearizable typed 
shared objects usually considered in the literature, e.g., [Her91, CJT94, Jay97, LH00]. The services
usually considered in the literature do not fail at all. There are only two papers we are aware of 
that consider services that can fail, [JCT98] and [AGMT95], but these papers assume the services 
are not implemented by the processes. In contrast to our model, the failures of the services and of 
the processes are not correlated in those two papers. We discuss this further in the Related Work 
section below. 


Our impossibility result says that it is impossible to build a consensus service tolerating f failures
from services that tolerate fewer than f failures, independently of the number of such services, how
powerful they are, or in what way they are accessed by the processes. Thus, for example, a strategy
in which multiple instances of (f − 1)-fault-tolerant services are used by different subsets of the
processes in the system cannot work. Methods based on splitting up processes, or divide and
conquer, also cannot work. In particular, our result holds when the underlying services include
consensus services tolerant to f − 1 stopping failures.


It is important to study consensus implementability because it is such a fundamental problem 
in distributed computing. In particular, there is Herlihy’s [Her91] universality result for services 
that do not fail: it is possible to design a wait-free implementation of a service of any type, shared 
by n processes, using only consensus services with n ports and registers. Our boosting impossibility 
result shows a limitation on this universality result when services can fail. 


Our impossibility holds for consensus implementability, but not for implementability of weaker
problems. Our second result is that it is possible to boost the fault-tolerance level of a system solving problems
easier than consensus, like k-consensus. In this problem processes have to agree on at most k
different values; thus, k-consensus reduces to consensus when k = 1. We present a simple algorithm
(generalizing the one in [HR94, HR00]) that solves k-consensus and tolerates f failures using k′-consensus
services that tolerate f′ failures, with f′ less than f, for various values of k′ and f′. For example,
k-consensus is solvable for 2k − 1 failures using only (consensus) services that tolerate only 1 failure
apiece.


Related work. Our main result is the impossibility of solving consensus f-resiliently using
(f − 1)-resilient services in an asynchronous system. Much prior work has studied the feasibility
of implementing f-tolerant consensus as a function of the available components in the asynchronous
system. The “components” can be simple message transmission channels or shared read/write
registers, but also more powerful objects, perhaps implemented in hardware, such as test&set, or
implemented with timeouts, such as failure detectors, or even combinations of different kinds of
objects. A typed shared object used in many papers is what we call a service, i.e., it has (i) a
number of ports; (ii) a set of states of the object (or values, as we call them); (iii) a set of
operations that processes may apply through its ports; and (iv) a behavior given in terms of
a transition relation δ; and it is assumed to be linearizable. The usual assumption, however, is that
the components themselves are reliable.


Work that assumes only the most basic components includes [FLP85], for just
message transmission, and [LAA87, Her91], for shared read/write registers; these papers prove that it is
impossible to solve f-tolerant consensus using only these simple components. In these works, the available
components, whether channels or registers, never fail. Since a consensus protocol that tolerates zero
crash faults is trivial, our result generalizes that of [FLP85], which is the special case f = 1.
Indeed, our proof technique is a generalization of the one in [FLP85]. The main difference is the
idea of modelling the services, which introduces many more scenarios to deal with in the proof. Also,
our events are much finer-grained: in FLP, in one event a process receives a message, makes a local
state change, and also sends any finite number of messages. Our events are I/O automata actions
in the model of distributed systems with services. So, for example, a process receiving a message
can only make a local state change; it cannot perform any output of any kind in the same event.


Other papers consider more general and powerful base objects (again, objects that never fail), and
investigate when they can be used to solve consensus. For example, [LH00] ask the following question for
f = 1: Let n ≥ 3 and S be a set of object types that can be used to solve one-resilient consensus
among n processes. Can S always be used to solve one-resilient consensus among n − 1 processes?
Many papers consider the other extreme, f = n − 1, and deal with the robustness question
posed in [Jay97]: can objects of types T and T′, neither of which can be used to solve wait-free
consensus by itself, be combined so that together they solve wait-free consensus?


Other papers relate implementations for different numbers of processes based on the same fault-tolerance
level f. Specifically, [CJT94] show that for all n > f > 2 and all sets S of shared object types
(that include simple read/write registers) there is an f-resilient solution to n-process consensus using
objects of types in S if and only if there is an f-resilient solution to (f + 1)-process consensus using
objects of types in S. Similarly, [BGLR01] show for k-set consensus that if there is an f-resilient implementation of
n-ported f-set consensus from registers, then there is an f-resilient implementation of (f + 1)-ported
f-set consensus from registers.


Thus, our question is orthogonal to the concerns of these previous works: while they assume
reliable components, we consider components that are less reliable, i.e., we ask what problems can
be solved in an f-resilient manner using components that tolerate fewer than f failures. We know
of two papers that do consider shared objects that may fail. Afek, Greenberg, Merritt, and Taubenfeld
[AGMT95] study wait-free implementations using objects that can fail by returning the wrong value
for a response. More closely related to our work is [JCT98], which considers base objects that may
fail by not responding (both [JCT98] and [AGMT95] consider other types of failures, like wrong
values returned, that are less related to our work). In their model any number of processes may fail, and at
most t base objects may fail. When an object fails, it stops responding. They have an impossibility
result for solving consensus for two processes tolerating even one nonresponsive-faulty service,
even if that service can be nonresponsive with respect to only one predetermined process. Their proof works
by a reduction from [LAA87]. This result is orthogonal to ours: the failures of the services in their
model are unrelated to the failures of the processes, while in our model, services can fail only due
to failures of processes. Thus, if no process fails, in our model we know no service will fail, while
in such a situation in their model services could still fail. On the other hand, they know that at
most one service will fail, while in ours there is no bound: if one service fails due to too many
processes failing, all the services associated with the same processes can also fail.


Our main concern in this paper is on the implementation of consensus. Recall that Herlihy 
[Her91] has shown that any object can be implemented using consensus. Thus consensus is at the 
top of a hierarchy. As mentioned above, our impossibility result does not hold for objects weaker 
than consensus. 


The paper is organized as follows. Section 2 gives technical preliminaries. Section 3 gives our
model of a distributed system, and defines the consensus problem. Section 4 presents our impossibility
result for consensus. Section 5 describes the contrasting result for k-set consensus. Section 6
discusses directions for further research and concludes. Appendix A presents some technical background.


2 Modeling Preliminaries 


2.1 Basic underlying model of concurrent computation 


We use the I/O automaton model [Lyn96, chapter 8] as our underlying model for concurrent computation.
We assume the terminology of [Lyn96, chapter 8]. An I/O automaton A is deterministic
iff, for each task t of A, and each state s of A, there is at most one transition (s, a, s′) such that
a ∈ t.


2.2 Variable types 


We define the notion of a “variable type”, in order to describe allowable sequential behavior of
services. The definition used here is a generalization of the one in [Lyn96, chapter 9]; the generalization
allows nondeterminism in the choice of the initial state and the next state. Namely, a
variable type T = (V, V_0, invs, resps, δ) consists of:


• V, a nonempty set of states of the variable, called values,
• V_0 ⊆ V, a nonempty set of initial values,
• invs, a set of invocations,
• resps, a set of responses, and
• δ, a subset of (invs × V) × (resps × V) that is “total”, in the sense that, for every (a, v) ∈
invs × V, there is at least one (b, v′) ∈ resps × V such that ((a, v), (b, v′)) ∈ δ.


A deterministic variable type is one in which δ is a mapping, i.e., for every (a, v) ∈ invs × V,
there is exactly one (b, v′) ∈ resps × V such that ((a, v), (b, v′)) ∈ δ.


The reason for generalizing the notion of a variable type to allow nondeterminism is that we 
want to make our notion of “service”, defined below, as general as possible. In particular, we want 
to include the problem of k-consensus, which can be specified using a nondeterministic variable 
type, in our consideration. 


Example. Read/write variable type: Here, V is some arbitrary set of “values,” V_0 = V,
invs = {read} ∪ {write(v) : v ∈ V}, resps = V ∪ {ack}, and δ is defined to include the following
pairs: ((read, v), (v, v)) for v ∈ V, and ((write(v), v′), (ack, v)) for v, v′ ∈ V.


Example. Consensus variable type: Here, V is the set of subsets of {0, 1} having at most one
element, V_0 = {∅}, invs = {init(v) : v ∈ {0, 1}}, resps = {decide(v) : v ∈ {0, 1}}, and δ is defined to
include the following pairs:

((init(v), ∅), (decide(v), {v})) for v ∈ {0, 1}, and ((init(v), {v′}), (decide(v′), {v′})) for v, v′ ∈ {0, 1}.


Example. k-consensus variable type: Here, V is the set of subsets of {0, 1, ..., k} having
at most k elements, V_0 = {∅}, invs = {init(v) : v ∈ {0, 1}}, resps = {decide(v) : v ∈ {0, 1}},
and δ is defined to include the following pairs: ((init(v), W), (decide(v′), W ∪ {v})) for |W| < k,
v′ ∈ W ∪ {v}, and ((init(v), W), (decide(v′), W)) for |W| = k, v′ ∈ W.
Thus, the first k values get remembered, and all operations return one of these first k values.
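
The example types above can also be read operationally. The following Python sketch is illustrative only: it encodes the transition relation δ of the consensus and k-consensus types as functions that return the set of allowed (response, new value) pairs. The function names and the tuple encoding of invocations and responses are assumptions made for this illustration and are not part of the formal model.

```python
def consensus_delta(inv, val):
    """Allowed (response, new_value) pairs of the consensus type.

    inv is ("init", v) with v in {0, 1}; val is a frozenset with at most
    one element, the empty frozenset being the initial value.
    """
    _, v = inv
    if not val:
        # The first invocation fixes the decision value.
        return {(("decide", v), frozenset({v}))}
    (w,) = val
    # The value is already fixed; every later invocation learns it.
    return {(("decide", w), val)}


def k_consensus_delta(k, inv, val):
    """Allowed (response, new_value) pairs of the k-consensus type.

    val is a frozenset W holding at most k remembered values.
    """
    _, v = inv
    if len(val) < k:
        new_val = val | {v}
        # Any remembered value (including v itself) may be returned.
        return {(("decide", w), new_val) for w in new_val}
    # W is full: v is not remembered, and some element of W is returned.
    return {(("decide", w), val) for w in val}
```

For k = 1 the second function allows exactly the outcomes of the first, which matches the remark that k-consensus reduces to consensus when k = 1.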


2.3 Canonical f-fault-tolerant atomic objects


We now define the notion of canonical f-fault-tolerant atomic object, which describes the allowable
concurrent behavior of services. The canonical f-fault-tolerant atomic object of type T for endpoint
set J and with index k is given in Figure 1 as an I/O automaton that is parameterized by k, T, J,
and f, where these are:


1. A unique index k, drawn from some index set K, 


2. An underlying variable type T = (V, V_0, invs, resps, δ), which defines the sequential behavior
of the object,


3. A set of “endpoints” J, and 


4. The required degree of fault-tolerance f. 


A canonical atomic object accommodates concurrent invocations by different processes, i.e., 
between an invocation from and response to a particular process, the invocations of other processes 
may arrive and be processed. The use of a set of endpoints allows different services to be connected 
to different sets of processes. Thus, J will be a subset of some set I of process indices, which
represents all the processes in the system. 


Our notion of atomic object generalizes that in [Lyn96, section 13.1.2]. We note the following
features of our atomic objects. Each process in J can issue any invocation of the atomic
object’s underlying variable type, and can (potentially) receive any allowable response. The result
of performing a particular operation is nondeterministically selected from all results allowed
by the transition relation δ and the current value val of the object. Thus, the object is, in general,
inherently nondeterministic in that it can exhibit nondeterminism that is not just due to the
nondeterminism of its invocations by different processes.


For every process P_i, i ∈ J, there corresponds a task of the atomic object, which we call an
i-task. The i-task consists of all the perform actions that carry out the operations invoked by
P_i, together with all the possible response actions giving responses to P_i. In addition, the i-task
contains a dummy_{k,i} action, which is enabled when either P_i has failed or more than f processes
in J have failed. Thus, by inspecting Figure 1 we see that for every i ∈ J, the task structure
requires that the object eventually respond to an outstanding invocation by P_i, unless either P_i
has failed or more than f processes in J have failed. In either of these cases, the object is allowed to
abstain from responding to P_i, since the internal action dummy_{k,i} is enabled, and can be executed
to discharge the fairness requirement imposed by the task structure. If more than f processes have
failed, then the object is allowed to abstain from responding to any process in J, since dummy_{k,i} is
enabled for all i ∈ J. This reflects the idea that the object is f-tolerant; once more than f failures
have occurred (amongst processes connected to the object), then the object can itself “fail” by
being “silent” forever from that point onwards. That is, we allow the object to violate its liveness
property. Note, however, that the object can never violate its safety property, e.g., by returning
values inconsistent with the transition relation δ. Note that we also allow the object to be silent if
all processes it is connected to (i.e., in J) fail, since dummy_{k,i} is then enabled for all i ∈ J.
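
The behavior just described can be made concrete with a small simulation. The following Python sketch is an illustration under stated assumptions, not the formal automaton: the class and method names are hypothetical, the scheduler is left to the caller, and a consensus transition relation is included only so that the example is self-contained. It shows the two buffers, the perform step, and the condition under which the dummy action of endpoint i is enabled.

```python
import random


def consensus_delta(inv, val):
    """Consensus transition relation, used here only as a sample type."""
    _, v = inv
    if not val:
        return {(("decide", v), frozenset({v}))}
    (w,) = val
    return {(("decide", w), val)}


class CanonicalAtomicObject:
    """Illustrative model of a canonical f-fault-tolerant atomic object."""

    def __init__(self, delta, initial_value, endpoints, f):
        self.delta = delta              # delta(inv, val) -> set of (resp, new_val)
        self.val = initial_value
        self.endpoints = set(endpoints)
        self.f = f
        self.inv_buffer = set()         # pairs (i, invocation)
        self.resp_buffer = set()        # pairs (i, response)
        self.failed = set()

    def invoke(self, i, inv):
        """Input action a_{k,i}: record the invocation."""
        assert i in self.endpoints
        self.inv_buffer.add((i, inv))

    def fail(self, i):
        """Input action fail_i."""
        self.failed.add(i)

    def dummy_enabled(self, i):
        """Precondition of dummy_{k,i}: i failed, or more than f failures."""
        return i in self.failed or len(self.failed) > self.f

    def perform(self, i, rng=random):
        """Internal perform step for endpoint i; returns False if nothing is pending."""
        pending = [inv for (j, inv) in self.inv_buffer if j == i]
        if not pending:
            return False
        inv = pending[0]
        # Nondeterministic choice among the outcomes allowed by delta.
        resp, new_val = rng.choice(sorted(self.delta(inv, self.val), key=repr))
        self.inv_buffer.discard((i, inv))
        self.val = new_val
        self.resp_buffer.add((i, resp))
        return True

    def respond(self, i):
        """Output action b_{k,i}: deliver a buffered response, if any."""
        for (j, resp) in list(self.resp_buffer):
            if j == i:
                self.resp_buffer.discard((j, resp))
                return resp
        return None
```

In this sketch, a scheduler that wants to model a silent object simply stops calling perform and respond for every endpoint i for which dummy_enabled(i) holds; safety is unaffected because every state change still goes through delta.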


2.4 f-fault-tolerant atomic objects 


Given a variable type T_k and a set J_k of endpoints, define an I/O automaton U to be a well-formed
environment for T_k and J_k if and only if


1. Its outputs are exactly the invocations of T_k at the endpoints in J_k, and its inputs are exactly
the responses of T_k at the endpoints in J_k, and


2. In every execution of U, for each endpoint i ∈ J_k, there are no two consecutive invocations at
i without an intervening response at i.
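
Condition 2 above is a simple per-endpoint alternation property, and it can be checked on a finite sequence of events. The following Python sketch is a minimal illustration; the function name and the encoding of events as (kind, endpoint) pairs are assumptions made for this example.

```python
def is_well_formed(events):
    """Check that no endpoint issues two invocations without a response in between.

    events is a list of (kind, endpoint) pairs, where kind is "inv" or "resp".
    """
    pending = set()  # endpoints with an invocation awaiting its response
    for kind, i in events:
        if kind == "inv":
            if i in pending:
                return False  # second invocation at i before a response at i
            pending.add(i)
        elif kind == "resp":
            pending.discard(i)
    return True
```

Invocations at different endpoints may interleave freely; only two invocations at the same endpoint without an intervening response are rejected.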


An I/O automaton A (a full-blown I/O automaton, with tasks) is said to be an f-fault-tolerant
atomic object of type T_k, set J_k of endpoints, and index k, if and only if it implements the f-fault-tolerant
canonical atomic object S_k of type T_k for J_k, in the following sense:


1. It has the same input and output actions (including the fail actions).


2. If U is a well-formed environment for T_k and J_k, then

(a) Any trace β of A × U is also a trace of S_k × U. (This should imply that A preserves
well-formedness and guarantees atomicity.)

(b) Any fair trace β of A × U is also a fair trace of S_k × U. (This should imply that the
implementation is f-fault-tolerant.)


Canonical Atomic-Object(k, (V, V_0, invs, resps, δ), J, f)

Signature

    Input:
        a_{k,i}, a ∈ invs, the invocations of Atomic-Object(k, (V, V_0, invs, resps, δ), J, f) by P_i, i ∈ J
        fail_i, i ∈ J
    Output:
        b_{k,i}, b ∈ resps, the responses of Atomic-Object(k, (V, V_0, invs, resps, δ), J, f) to P_i, i ∈ J
    Internal:
        perform((a, v), (b, v′))_{k,i}, a ∈ invs, b ∈ resps, v, v′ ∈ V, i ∈ J
        dummy_{k,i}, i ∈ J

State

    val, a value in V, initially a value in V_0
    inv-buffer, a set of pairs (i, a), for a_{k,i} an input action
    resp-buffer, a set of pairs (i, b), for b_{k,i} an output action
    failed ⊆ J, initially empty

Actions

    Input a_{k,i}
        Eff: inv-buffer ← inv-buffer ∪ {(i, a)}

    Internal perform((a, v), (b, v′))_{k,i}
        Pre: (i, a) ∈ inv-buffer ∧ val = v ∧ δ((a, v), (b, v′))
        Eff: inv-buffer ← inv-buffer − {(i, a)};
             val ← v′;
             resp-buffer ← resp-buffer ∪ {(i, b)}

    Output b_{k,i}
        Pre: (i, b) ∈ resp-buffer
        Eff: resp-buffer ← resp-buffer − {(i, b)}

    Input fail_i
        Eff: failed ← failed ∪ {i}

    Internal dummy_{k,i}
        Pre: i ∈ failed ∨ |failed| > f
        Eff: none

Tasks

    For every i ∈ J: {perform((a, v), (b, v′))_{k,i} : δ((a, v), (b, v′))} ∪ {b_{k,i} : b ∈ resps} ∪ {dummy_{k,i}}

Figure 1: I/O automaton for the canonical f-fault-tolerant atomic object with endpoints J and
type T = (V, V_0, invs, resps, δ)


3 Model of Computation 


The model we consider for our problem consists of a collection of processes, channels, and services,
which we define formally below. For the rest of this section, we fix:


• I, K, finite index sets,

• T, a variable type for the entire system, representing the problem being solved, and

• M, a message alphabet.


Figure 2: The interfaces of process P_i, channels C_{i,j}, C_{j,i}, and service S_k in the complete system (actions shown: a_i, b_i, send(m), receive(m), fail_i).


A distributed system with services (DSS) for I, K, T, M is the parallel composition of I/O automata
(see [Lyn96, chapter 8]) of the following kinds:


1. processes P_i, i ∈ I,

2. channels C_{i,j}, i, j ∈ I, i ≠ j, and

3. services S_k, k ∈ K. We let T_k denote the variable type and J_k ⊆ I denote the set of endpoints
of service S_k.


Processes interact directly via channels: process P_i communicates with process P_j over the unidirectional
channel C_{i,j}. Processes also interact with services: process P_i can invoke service S_k provided that
i is in S_k’s set of endpoints. Services do not communicate directly with one another; however, they
interact indirectly via common processes. Figure 2 shows the interfaces that a process, channel,
and service have. In the remainder of this section, we provide more details about the components.


3.1 Processes 
Process P_i, i ∈ I, has the following kinds of inputs and outputs:


1. Inputs a_i and outputs b_i, where a is an invocation of type T and b is a response of type T.
These represent P_i’s interactions with its own clients (the outside world).


2. Outputs send(m)_{i,j} and inputs receive(m)_{j,i}, m ∈ M, which connect to channels C_{i,j} and
C_{j,i}, respectively.


3. For every service S_k such that i ∈ J_k, outputs a_{k,i}, where a is an invocation of type T_k, and
inputs b_{k,i}, where b is a response of type T_k.


4. Input fail_i.


We assume that P_i observes well-formedness for each separate service S_k: it does not issue two
invocations on S_k without receiving a response to the first one. However, P_i is allowed to issue an
invocation on a service without waiting for previous invocations on other services to respond. That
is, P_i can issue concurrent invocations to different services, but not to the same service. We also
assume that the client of P_i is well-formed with respect to P_i: it does not issue two invocations to
P_i without receiving a response to the first one. We assume that P_i has only a single task, which
therefore consists of all the locally-controlled actions of P_i. We assume that in every state, some
action in that single task is enabled. We assume that the fail_i input action sends P_i into some
kind of state from which (from that point onward), no output actions are enabled. However, other
locally-controlled actions may be enabled; in fact, by the restriction just above, some such action
must be enabled. This action might be a “dummy” action, as in the fault-tolerant atomic objects
defined earlier.


3.2 Services 


We define an f-fault-tolerant service of a particular variable type T_k for a particular set J_k of
endpoints, to be simply the canonical f-fault-tolerant atomic object of type T_k for J_k. Let T_k.invs,
T_k.resps denote the set of invocations, responses, respectively, of the variable type T_k.


The safety properties of a service S_k are determined by its finite traces, which are determined
by its start states, transitions, and signature. These are all part of the definition of the service as an
I/O automaton. Likewise, the liveness properties of a service S_k are determined by the automaton
task structure and the usual conventions for fair executions of I/O automata.


We say that P_i has an outstanding invocation to a service S_k iff either (1) the invocation buffer
of S_k contains an invocation of the form (i, a), a ∈ T_k.invs, or (2) the response buffer of S_k contains
a response of the form (i, b), b ∈ T_k.resps.


We say that a service S_k is silent along an execution α iff the only actions that S_k executes
along α are dummy actions.


3.3 Channels


Channel C_{i,j} is a FIFO reliable channel, as defined in [Lyn96, chapter 14]. Its inputs are send(m)_{i,j}
actions, which are outputs of P_i, and its outputs are receive(m)_{i,j} actions, which are inputs of P_j.
A channel has exactly one task, consisting of its locally controlled actions.


3.4 The task structure of a complete system 


The ordinary assumptions about I/O automata mean that the system executes using a “weakly
fair” scheduling discipline: in any execution, every task that is continuously enabled gets selected
for execution infinitely often. (Thus, an enabled task is eventually either disabled or executed.)
For a service S_k, there is a task for each i ∈ J_k, consisting of the actions {perform((a, v), (b, v′))_{k,i} :
δ((a, v), (b, v′))} ∪ {b_{k,i} : b ∈ resps} ∪ {dummy_{k,i}}, see Figure 1. For a process P_i there is a single task,
consisting of all the locally controlled actions of P_i. Likewise, for a channel C_{i,j}, there is a single
task, consisting of all the locally controlled actions of C_{i,j}, i.e., the receive(m)_{i,j} actions, m ∈ M.


Since a task of a component contains only its locally controlled actions, we infer from the
signature compatibility condition for I/O automata that the tasks define a partition of the set of
all actions in the system, except the init(v)_i and fail_i actions; each action occurs in exactly one
task.


With this task structure, the weak fairness discipline implies that every message that is sent 
is eventually received, every process executes infinitely often along an infinite fair execution, and 
every outstanding invocation (of a service) eventually receives a response. 


We introduce a naming scheme for tasks as follows. The single task of P_i, i ∈ I, is called pt_i. The
single task of channel C_{i,j}, i, j ∈ I, i ≠ j, is called ct_{i,j}. The task of service S_k, k ∈ K, for i ∈ J_k is
called st_{k,i}. We define PT = {pt_i : i ∈ I}, CT = {ct_{i,j} : i, j ∈ I, i ≠ j}, ST = {st_{k,i} : k ∈ K, i ∈ J_k},
and T = PT ∪ CT ∪ ST. We call the tasks pt_i (i ∈ I) process tasks, the tasks ct_{i,j} (i, j ∈ I, i ≠ j)
channel tasks, and the tasks st_{k,i} (k ∈ K, i ∈ J_k) service tasks.


For any action a except an init(v)_i or fail_i, we define task(a) to be the unique t such that t ∈ T
and a ∈ t, i.e., task(a) is the name of the task containing a. We define task(init(v)_i) = init(v)_i,
and task(fail_i) = fail_i, i.e., we consider these actions as being the sole members of singleton tasks,
and overload the name of the action as the name of the corresponding task. If e is a channel task
ct_{i,j}, then let receiver(e) be the process P_j.
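
As a small illustration of the naming scheme, the following Python sketch enumerates the task names PT, CT, and ST for a given index set I and a given assignment of endpoint sets J_k. The tuple encoding of task names and the function names are assumptions made for this example only.

```python
def task_names(I, J):
    """Return the sets PT, CT, ST of task names.

    I is an iterable of process indices; J maps each service index k
    to its endpoint set J_k (a subset of I).
    """
    I = list(I)
    PT = {("pt", i) for i in I}
    CT = {("ct", i, j) for i in I for j in I if i != j}
    ST = {("st", k, i) for k, Jk in J.items() for i in Jk}
    return PT, CT, ST


def receiver(task):
    """For a channel task ct_{i,j}, return the index j of the receiving process."""
    kind, _, j = task
    assert kind == "ct"
    return j
```

For three processes and a single service with two endpoints, this yields three process tasks, six channel tasks (one per ordered pair of distinct processes), and two service tasks.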


3.5 The Consensus problem 


The “traditional” specification of f-fault-tolerant consensus is given in terms of a set {P_i, i ∈ I}
(I is an index set) of processes that each starts with some value v_i drawn from {0, 1}. Processes
are subject to crash failures [Sch90], which disable the process from producing any output.¹ As a
result of engaging in a consensus algorithm, each nonfaulty process eventually “decides” on a value
from {0, 1}. The behavior of processes is required to satisfy the following three conditions [Lyn96,
chapter 6]:


Agreement No two processes decide on different values. 
Validity The value decided on is the initial value of some process. 


Termination In every infinite fair execution, all nonfaulty processes eventually decide. 
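
The two safety conditions can be checked directly on the outcome of a finite run. The following Python sketch is a minimal illustration; the function name and the dictionary encoding of initial values and decisions are assumptions made for this example. Termination is a liveness property over infinite fair executions and is therefore not checked here.

```python
def check_agreement_and_validity(inits, decisions):
    """Return (agreement, validity) for one run.

    inits maps each process to its initial value; decisions maps each
    process that has decided to the value it decided on.
    """
    decided = set(decisions.values())
    agreement = len(decided) <= 1            # no two different decision values
    validity = decided <= set(inits.values())  # every decision is some initial value
    return agreement, validity
```

For instance, a run in which processes 1 and 3 both decide 1 while some process started with 1 satisfies both conditions, whereas a run in which two processes decide different values violates agreement.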


We specify the consensus problem in a slightly different way. We say that a DSS S solves f-fault-tolerant
consensus for I if and only if S is an f-fault-tolerant atomic object of type consensus
(Section 2.2) for endpoint set I.


We now show that any system that meets our definition also meets the traditional one. We
argue that the f-fault-tolerant canonical consensus object for endpoint set I satisfies the three
conditions above (with a slight variation of the termination condition).


From the definition of the consensus variable type, each process in I has two invocations, init(0),
init(1) and two responses, decide(0), decide(1). By inspecting the consensus variable type given in
Section 2.2, we see that the value of the variable is initially ∅, and on invocation init(0) can change
from ∅ to {0}, and on invocation init(1) can change from ∅ to {1}, and is stable once it is different
from ∅. It is also clear that any decide(0) response is only issued by the object when the variable
has value {0}, and any decide(1) response is only issued by the object when the variable has value
{1}. Hence, after the first decide(0) response, all subsequent responses will be decide(0), and after
the first decide(1) response, all subsequent responses will be decide(1). So, the canonical consensus
object satisfies the agreement condition. If all invocations are init(0), then the only possible change
of the variable is from ∅ to {0}. Hence, all responses will be decide(0). Likewise if all invocations
are init(1), then all responses will be decide(1). Otherwise, there are both init(0) and init(1)
invocations, so either possible decision value occurs in some invocation. Hence, in all cases, the value
decided on is the value occurring in some invocation.
Hence, the canonical consensus object satisfies the validity condition. If at least one process invokes
the f-fault-tolerant canonical consensus object, then the value of the variable will eventually be
either {0} or {1}, provided that no more than f processes fail, and that the scheduling is weakly fair, as
discussed in Section 3.4. Hence, all nonfaulty processes that invoke the object will receive a decide
response, along fair executions in which no more than f processes fail. Processes that do not invoke
the object will not receive a response, even if they are nonfaulty. That is, processes that do not
invoke the object (with an init(v) action) do not participate in the consensus algorithm, and hence
are not required to have an initial value. This is a slightly different condition than the traditional
termination condition, which requires that all nonfaulty processes do have an initial value, and that
they all eventually decide. Here, only the nonfaulty processes that “participate,” by invoking the
object, will receive a decision.


¹ Crash failures are usually defined as disabling the process from executing at all. However, the two definitions are
equivalent with respect to overall system behavior.


Since any system S that solves f-fault-tolerant consensus for I can only exhibit behaviors
(in composition with a well-formed environment) that are a subset of the behaviors of the f-fault-tolerant
canonical consensus object, the desired conclusion follows.


4 The Impossibility Result 


The problem we address is to design a system, as given in Section 3, which is an f-fault-tolerant
atomic object (Section 2.4) of type consensus for some (arbitrary) set I of endpoints. We show
that, when the services in the system are restricted to be (f − 1)-fault-tolerant atomic objects,
this problem is impossible to solve. The services can have arbitrary types, and can have as
endpoints any subset of I. Thus, techniques based on quorums, replication, and redundancy could
all be implemented within our model. Our result implies that none of these approaches would help:
a limitation on the fault-tolerance of the underlying services is also a fundamental limitation on
the fault-tolerance of any consensus service that can be built from these underlying services.


Since we now restrict attention to systems that are consensus objects, the inputs a_i and outputs 
b_i that represent P_i's interactions with its own clients are now instantiated as the inputs init(0)_i, 
init(1)_i, and the outputs decide(0)_i, decide(1)_i, for the single consensus client that P_i now interacts 
with. 


4.1 Main result and proof assumptions 


The main result of the paper is: 
Theorem 1 Let I be an arbitrary endpoint set such that |I| > 2, and let f be such that 1 < f < |I|. 
Then there does not exist a distributed system with services that is an f-fault-tolerant atomic 
consensus object for endpoint set I, if the services are (f - 1)-fault-tolerant. 


Note that the services can be of any variable type. We assume in the sequel that such a DSS, P, 
exists, and derive a contradiction. 


We assume that all the processes of P are deterministic automata, as defined in Section 2.1. 
Since channels are FIFO, they are already deterministic. We assume a slightly weaker condition 
for services, namely that the variable type of each service is deterministic, i.e., the relation δ of the 
underlying variable type is a mapping. For an impossibility proof, these assumptions are made 
without loss of generality, since processes and services can be made to satisfy the above conditions 
by removing a subset of the locally-controlled transitions. Hence, if an unrestricted solution exists, 
then a solution satisfying our assumptions also exists. 


4.2 Terminology used in the proof 
4.2.1 Transitions 


A transition is a triple (s,a,s'). We define first(s,a,s') = s, action(s,a,s') = a, last(s,a,s') = s'. 
The participants of a locally controlled action a of the system (i.e., not an init(v)_i or fail_i action) 
are all automata with a in their signature: participants(a) = {A | a ∈ acts(A)}. The participants 
of a transition (s,a,s') are the participants of its action: participants(s,a,s') = participants(a). 


If the action a of a transition is an output action of some component A (process or service, since 
channels do not have internal actions), then we say that the transition is an output transition of 
A. We define internal transition of A similarly. Due to I/O automaton signature compatibility, a 
transition can be the output or internal transition of at most one component. Furthermore, due to 
the structure of the system, as given in Section 3, every transition, with the exception of transitions 
due to the execution of the init(v)_i inputs to P_i, and fail_i actions, is either an output transition or 
an internal transition of exactly one component. 


4.2.2 Tasks and scheduling 


We say that a task e is applicable to a global state s iff some action of e is enabled in state s. If 
α is a finite execution, then we say that e is applicable to α iff e is applicable to last(α). Thus, 
if e is an applicable channel task ct_{i,j}, then the corresponding channel C_{i,j} must be nonempty, so 
that a message can actually be delivered. If e is an applicable service task st_{k,i}, then either the 
invocation buffer of service S_k must contain an invocation from process P_i, or the response buffer 
of S_k must contain a response to P_i, or the dummy_{k,i} action must be enabled. We assume, for 
technical convenience, that a process always has an enabled locally controlled action, and so a 
process task is always applicable. 


An applicable task e, together with the current global state, determines a unique transition 
(arising from the scheduling of task e in the current state) since processes and channels are de- 
terministic, and the variable type underlying a service is also deterministic. We denote this tran- 
sition as transition(e,s). Let transition(e,s) = (s,a,s’). Then, we apply the notation defined in 
Section 4.2.1 to transition(e,s) as follows: first(e,s) = s, action(e,s) = a, last(e,s) = s’. We 
abbreviate last(e,s) by e(s). We note that transition(e,s), first(e,s), action(e,s), last(e,s) are 
defined if and only if e is applicable to s. 


We note that when e is a channel task, then transition(e,s) always causes a change of state, 
i.e., e(s) ≠ s, since some message is delivered by the channel. When e is a service task st_{k,i}, then 
transition(e,s) causes a change of state unless it corresponds to the execution of a dummy_{k,i} action. 
When e is a process task, then transition(e,s) may or may not cause a state change. This 
depends on the transition structure of the process, about which we make no assumptions. 


4.2.3 Executions 


Define an initialization of P to be a finite execution containing exactly |I| actions, which moreover 
are all init(v_i)_i actions, one for each i ∈ I. Define an execution α of P to be input-first iff it has an 
initialization as a prefix, and otherwise contains no init actions. If α is a finite execution, then an 
extension of α is an execution α' such that α is a prefix of α'. Define a finite input-first failure-free 
execution α to be 0-valent if (1) some input-first failure-free extension of α contains a decide(0)_i 
action, for at least one i ∈ I, and (2) no input-first failure-free extension of α contains a decide(1)_i 
action, for any i ∈ I. The definition of 1-valent is analogous. Define a finite failure-free execution 
α to be univalent iff it is either 0-valent or 1-valent. Define a finite input-first failure-free execution 
α to be bivalent iff it has some input-first failure-free extension that contains a decide(0)_i action, 
for at least one i ∈ I, and some input-first failure-free extension that contains a decide(1)_i action, 
for at least one i ∈ I. 
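

As a rough illustration of these definitions, the following sketch classifies states of a small, finite, hand-made transition graph by the decide actions reachable from them. It is only an approximation under stated assumptions: the paper defines valence for executions and their input-first failure-free extensions in a graph that is infinite in general, whereas here a state stands in for the execution ending in it. The names `TOY`, `reachable_decisions`, and `valence` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[Tuple[str, str]]]  # state -> list of (action label, next state)


def reachable_decisions(graph: Graph, state: str) -> Set[int]:
    """Collect every v such that a decide(v) edge is reachable from `state`."""
    seen: Set[str] = set()
    found: Set[int] = set()
    stack = [state]
    while stack:
        s = stack.pop()
        if s in seen:
            continue
        seen.add(s)
        for action, nxt in graph.get(s, []):
            if action.startswith("decide("):
                found.add(int(action[len("decide("):-1]))
            stack.append(nxt)
    return found


def valence(graph: Graph, state: str) -> str:
    decisions = reachable_decisions(graph, state)
    if decisions == {0}:
        return "0-valent"
    if decisions == {1}:
        return "1-valent"
    if decisions == {0, 1}:
        return "bivalent"
    return "undetermined"  # no decide action is reachable in this finite toy graph


# A toy graph: task e leads from s' to s0, task e' leads to s'', and e then leads to s1.
TOY: Graph = {
    "s'": [("e", "s0"), ("e'", "s''")],
    "s''": [("e", "s1")],
    "s0": [("decide(0)", "d0")],
    "s1": [("decide(1)", "d1")],
}
```

In this toy graph, `valence(TOY, "s'")` reports a bivalent state whose two successors s0 and s'' are univalent with different values.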


Since the assumed f-fault-tolerant atomic consensus object P is an I/O automaton, we can 
view its transition relation as defining a labeled directed graph whose nodes are the states of P and 
which contains a directed edge from s to s’ labeled with a iff (s,a,s’) is in the transition relation 
of P. This graph is called the global state transition graph of P. Let G(P) be the subgraph of 
the global state transition graph of P obtained as follows: (1) include every state that lies along 
an input-first execution, and (2) include all the transitions of P that connect the states that are 
included by virtue of (1). 


4.2.4 Schedules 


A schedule is a finite sequence of task names drawn from T ∪ {init(v)_i, fail_i : v ∈ {0,1}, i ∈ I}. 
Let σ = e_1 e_2 ... e_n be a schedule, and s be a global state, such that e_1 is applicable to s, e_2 is 
applicable to e_1(s), and, generally, e_i is applicable to e_{i-1}(e_{i-2}(...(e_1(s))...)) for all i, 1 < i ≤ n. 
Then, we say that σ is applicable to s, and we let σ(s) denote e_n(e_{n-1}(...(e_1(s))...)). A schedule 
σ is applicable to a finite execution α iff σ is applicable to last(α). In this case, we let σ(α) denote 
the resulting extension of α. 
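

The definition of applying a schedule can be sketched as a fold over task names. The example below is a toy under stated assumptions: the global state is just one channel plus a counter for one process, and the names `step` and `apply_schedule` are hypothetical. A task that is not applicable yields `None`, in which case the whole schedule is not applicable.

```python
from typing import Optional, Sequence, Tuple

# Toy global state: (contents of one FIFO channel, number of messages received so far).
State = Tuple[Tuple[str, ...], int]


def step(task: str, s: State) -> Optional[State]:
    """Deterministic transition for one task, or None if the task is not applicable."""
    channel, received = s
    if task == "ct":                 # channel task: applicable iff the channel is nonempty
        if not channel:
            return None
        return (channel[1:], received + 1)
    if task == "pt":                 # process task: always applicable, here a no-op step
        return s
    raise ValueError(f"unknown task {task!r}")


def apply_schedule(schedule: Sequence[str], s: State) -> Optional[State]:
    """Return the state reached by the schedule, or None if it is not applicable to s."""
    for task in schedule:
        nxt = step(task, s)
        if nxt is None:
            return None
        s = nxt
    return s
```

For instance, the schedule `["ct", "pt", "ct"]` is applicable to a state whose channel holds two messages, while `["ct", "ct", "ct"]` is not, since the third delivery finds the channel empty.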


Let α = s_0 a_1 s_1 a_2 s_2 ... s_{i-1} a_i s_i be a finite execution. Then, we define the schedule 
schedule(α) = task(a_1) task(a_2) ... task(a_i). That is, for each action in α, we take the name of 
the task containing the action. schedule(α) then consists of these task names in the same order as 
their corresponding actions. 


4.3 The proof 


Our proof will build up a series of lemmas establishing certain constraints on G(P). We start with 
the basic commutativity situation illustrated in Figure 3. 


Lemma 2 Let s be any global state of the f-fault-tolerant atomic consensus object P, and let e_1, 
e_2 be tasks such that 


1. e_1, e_2 are both applicable to s, and 


2. participants(e_1, s) ∩ participants(e_2, s) = ∅. 


Let e_1(s) = s_1, and e_2(s) = s_2. Then, e_2 is applicable to s_1, and e_1 is applicable to s_2, and 
e_2(s_1) = e_1(s_2). 


Figure 3: Commuting tasks w.r.t. a state s. 


Proof. By assumption, (e_1,s) and (e_2,s) only affect the state of different components. It 
follows that e_2 is applicable to s_1, and that e_1 is applicable to s_2. By determinism, it follows that 
participants(e_1, s) = participants(e_1, s_2), and that (e_1,s) and (e_1,s_2) are the same transition 
"locally," i.e., they effect exactly the same state changes in the components in participants(e_1, s). 
Likewise for (e_2,s) and (e_2,s_1). Thus, the accumulated state changes of (e_1,s) followed by (e_2,s_1) 
are the same as the accumulated state changes of (e_2,s) followed by (e_1,s_2). Hence the lemma 
holds. Figure 3 illustrates the proof. 
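

The commutativity argument can be checked on a toy global state. In the sketch below, which is only an illustration under stated assumptions, a global state is a dictionary of component-local states, and the task functions `deliver`, `step_p3`, and `snapshot_p2` are hypothetical. Tasks with disjoint participants commute; a pair that shares a participant need not.

```python
from typing import Callable, Dict

State = Dict[str, tuple]
Task = Callable[[State], State]


def deliver(s: State) -> State:
    """Task e1: channel C12 hands its head message to P2 (participants: C12 and P2)."""
    head, rest = s["C12"][0], s["C12"][1:]
    return {**s, "C12": rest, "P2_inbox": s["P2_inbox"] + (head,)}


def step_p3(s: State) -> State:
    """Task e2: an internal step of P3 (participant: P3 only)."""
    return {**s, "P3": s["P3"] + ("tick",)}


def snapshot_p2(s: State) -> State:
    """Task e3: P2 records the size of its own inbox (participant: P2 only)."""
    return {**s, "P2_seen": (len(s["P2_inbox"]),)}


def commute(s: State, e1: Task, e2: Task) -> bool:
    """True iff applying e1 then e2 reaches the same global state as e2 then e1."""
    return e2(e1(s)) == e1(e2(s))


# P2's local state is split over the keys "P2_inbox" and "P2_seen".
START: State = {"C12": ("m",), "P2_inbox": (), "P2_seen": (), "P3": ()}
```

Here `deliver` and `step_p3` have disjoint participants and commute, as in Lemma 2, whereas `deliver` and `snapshot_p2` both involve P2 and produce different final states depending on their order.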


Lemma 3 The f-fault-tolerant atomic consensus object P must have a bivalent initialization. 


Proof. Recall that we assume f > 1 (Section 4.1). The argument is then exactly the same as that 
in the proof of Lemma 12.3 in [Lyn96, chapter 12]. 


Suppose there exists a finite input-first failure-free execution α_{s'}, and states s, s', s'', s_0, s_1, 
and tasks e, e' which are related as given by Figure 4. We call such a configuration a hook, after 
[CHT96]. We say that the hook starts in state s, and we call α_{s'} the stem of the hook. We also 
admit as a hook a configuration in which the 0-valent and 1-valent states are interchanged. 


Lemma 4 Let α_s be a finite input-first failure-free bivalent execution of G(P), and let first(α_s) = 
s_start, last(α_s) = s. Let e be a task of P applicable to α_s. Let 
U = {α_x | α_x = σ(α_s), σ is a finite failure-free schedule applicable to α_s and not containing e}, 
V = {e(α_x) | α_x ∈ U and e is applicable to α_x}. 
Then either (1) V contains a bivalent execution, or (2) G(P) contains a subgraph which is a hook 
starting in s_start, as given by Figure 4. 


Figure 4: A hook starting in s. (State s_0 is 0-valent and state s_1 is 1-valent.) 


Proof. We assume both the antecedent of the lemma and the negation of (1), and establish (2). 


Now e is either a channel task, process task, or service task. If e is a channel task ct_{i,j}, then 
applicability of e to s means that channel C_{i,j} contains a message in state s. Thus, e is also 
applicable to any state reached from s by a schedule not containing e, since the message remains in 
C_{i,j} as long as ct_{i,j} is not scheduled. If e is a process task, then e is applicable to any state, by 
our assumption that a process always has some enabled locally controlled action. If e is a service 
task st_{k,i}, then applicability of e to s means that either service S_k has a pending invocation from 
process P_i in state s, or dummy_{k,i} is enabled. Thus, e is also applicable to any state reached from 
s by a schedule not containing e, since the invocation (if present) remains pending as long as st_{k,i} 
is not scheduled, and dummy_{k,i} remains enabled once it is enabled. We have therefore shown: 


e is applicable to every execution in U. (a) 


Since α_s is bivalent, there exist a 0-valent extension α_{x_0} of α_s and a 1-valent extension α_{x_1} of 
α_s. For i ∈ {0,1}, we argue as follows. 


CASE 1: α_{x_i} ∈ U. Let α_{y_i} = e(α_{x_i}). Hence α_{y_i} is i-valent, since α_{x_i} is i-valent. Also, α_{y_i} ∈ V, 
since α_{x_i} ∈ U. 


CASE 2: α_{x_i} ∉ U. Then, e was applied in extending α_s to α_{x_i}. Let α_{y_i} be the unique prefix of 
α_{x_i} that extends α_s and ends with the first action after α_s whose task is e. α_{y_i} is unique due to our 
assumptions in Section 4.1 about the deterministic behavior of processes and variable types. Hence 
α_{y_i} = e(α'_x) for some extension α'_x of α_s reached without scheduling e, i.e., α'_x ∈ U. 
Hence α_{y_i} ∈ V by definition of V. Since (1) is false by assumption, V contains no bivalent 
executions. Hence α_{y_i} is univalent. But α_{x_i} is i-valent and is an extension of α_{y_i}. Hence α_{y_i} is 
i-valent. 


Figure 5: Existence of the hook. 


Thus, in both cases, we have that α_{y_i} ∈ V and α_{y_i} is i-valent. Moreover, this holds for both 
i = 0 and i = 1. Thus 


there exist 0-valent α_{y_0} ∈ V and 1-valent α_{y_1} ∈ V (b) 


Let α_v = e(α_s), and let v = last(α_v). Hence α_v ∈ V, and so α_v is univalent by the assumption that 
(1) is false. Without loss of generality, let α_v be 0-valent. By (b), there exists α_{w_m} ∈ V which is 
1-valent. Let α_{u_m} be an execution in U such that e(α_{u_m}) = α_{w_m}, and let u_m = last(α_{u_m}). Hence, we 
have the situation depicted in Figure 5, since α_{u_m} is an extension of α_s. (The state s is the same 
state in Figures 4 and 5.) Consider the (unique) execution fragment γ such that α_{u_m} = α_s γ. By 
(a), e is applicable to every state along γ. Since the resulting executions are all in V by definition, 
they are all univalent, by assumption. Since α_v is 0-valent and α_{w_m} is 1-valent, it follows that there 
exist two such executions, α_0 and α_1, such that α_0 is 0-valent, α_1 is 1-valent, and α_0, α_1 result from 
applying e to adjacent states along γ. The subgraph of G(P) generated by taking the "union" of α_0 
and α_1 (i.e., take all states and transitions occurring in one, or both, of α_0, α_1) is then the desired 
hook. 
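

The final step of this proof is a discrete intermediate-value argument: a sequence of univalent labels that starts at 0 and ends at 1 must change value between two adjacent positions. A minimal sketch, assuming the valence of e applied to each state along the path is already known and given as a list (the function name `first_flip` is hypothetical):

```python
from typing import Optional, Sequence


def first_flip(valences: Sequence[int]) -> Optional[int]:
    """Return the first index k with valences[k] != valences[k + 1], or None if no such k.

    valences[k] is assumed to be the valence (0 or 1) of the execution obtained by
    applying task e to the k-th state along the path.
    """
    for k in range(len(valences) - 1):
        if valences[k] != valences[k + 1]:
            return k
    return None
```

Whenever the first and last entries differ, `first_flip` returns an index, and the two adjacent states at that index play the roles of the states to which e is applied in the hook.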


Lemma 5 G(P) does not contain as a subgraph a hook whose stem is a finite input-first failure-free 
execution. 


Proof. Our proof is by contradiction. We assume that G(P) does contain such a hook, and establish 
that P is not an f-fault-tolerant atomic consensus object, contrary to assumption. 


Without loss of generality, we assume the configuration in Figure 4. For each state except s_start, 
we let α subscripted with the state name denote the unique finite execution which is contained in 
the hook and which ends in that state: α_{s'} is the stem of the hook, α_{s_0} ends in s_0, α_{s_1} ends in s_1, 
and α_{s''} ends in s''. 


We remark that α_{s'} cannot contain any decide actions, since it is bivalent, and this would 
otherwise violate the agreement property. We first establish Claims 1-3. 


Claim 1: e ≠ e'. 
Suppose not. Then, by determinism (Section 4.1), we have s_0 = s''. Now s_1 is reachable from s'', 
and s_1 is 1-valent. Hence, s'' is either bivalent or 1-valent. However, s_0 is 0-valent. Hence we have 
a contradiction. So, Claim 1 is established. 


Claim 2: |participants(e, s')| ≤ 2, |participants(e', s')| ≤ 2. 
From the structure of a DSS (Section 3), we see that every output action of some component is an 
input action of at most one other component. The claim follows. 


Claim 3: |participants(e, s') ∩ participants(e', s')| ≤ 1. 
From Claim 2, we immediately have that |participants(e, s') ∩ participants(e', s')| ≤ 2. Suppose 
|participants(e, s') ∩ participants(e', s')| = 2. From Claim 1, we know that e ≠ e'. Hence, it must 
be that, for some distinct components C_1, C_2, action(e,s') is an output action of C_1 and an input 
action of C_2, and action(e',s') is an input action of C_1 and an output action of C_2. Since services and 
channels have no actions in common, the only possibilities for this are: 


• {C_1, C_2} = {P_i, S_k} for some P_i, S_k. 
This violates well-formedness of P_i for S_k. 


• {C_1, C_2} = {P_i, C_{i,j}} for some P_i, C_{i,j}. 
No output action of C_{i,j} is an input action of P_i. 


• {C_1, C_2} = {P_i, C_{j,i}} for some P_i, C_{j,i}. 
No output action of P_i is an input action of C_{j,i}. 


Since all three cases lead to a contradiction, the claim is established. 


From Claim 3, we have four possibilities for participants(e, s') ∩ participants(e', s'). To complete 
the proof of the lemma, we consider each separately. 


CASE 1: participants(e, s') ∩ participants(e', s') = ∅. Hence, the antecedent of Lemma 2 holds 
for s = s', e_1 = e, and e_2 = e'. Hence, e' is applicable to s_0, and e'(s_0) = s_1. Hence, e'(α_{s_0}) and 
α_{s_1} have at least one infinite fair extension with a common suffix. Since α_{s'} does not contain any 
decide actions, it follows that the suffix must contain decide actions. Now α_{s_0} is 0-valent and α_{s_1} 
is 1-valent. Hence, no matter what decide actions this common suffix contains, it will violate the 
valencies of at least one of α_{s_0}, α_{s_1}. 


CASE 2: participants(e, s') ∩ participants(e', s') = S_k. 


Subcase 2.1: At least one of action(e, s'), action(e', s') is not a perform action of S_k. Hence 
at least one of these is an invocation or a response. Now invocation and response actions do not 
change the value of the underlying variable of S_k. 


Since both these actions are enabled in s', it follows that the enablement of neither action 
depends on the prior execution of the other action (this might be the case for certain invocation, 
perform or perform, response pairs of actions, but not here). Hence, from Figure 1, we see that 
these actions commute, in that their order can be reversed and the same final global state will 
result. Hence, e' is applicable to s_0, and e'(s_0) = s_1. Hence, e'(α_{s_0}) and α_{s_1} have at least one 
infinite fair extension with a common suffix. Since α_{s'} does not contain any decide actions, it 
follows that the suffix must contain decide actions. Now α_{s_0} is 0-valent and α_{s_1} is 1-valent. Hence, 
no matter what decide actions this common suffix contains, it will violate the valencies of at least 
one of α_{s_0}, α_{s_1}. 


Subcase 2.2: Both of action(e,s'), action(e',s') are perform actions of S_k. Since α_{s'} is 
bivalent, then, under the assumption that P solves f-fault-tolerant consensus, α_{s'} cannot contain 
any decide actions, since that would violate agreement. Hence, α_{s_0} does not contain any decide 
actions either, since action(e, s') is not a decide. 


Let α'' be an infinite fair execution that extends α_{s_0}, and let α' be the suffix of α'' starting in 
state s_0. Furthermore, let α' be chosen such that: 


1. The first f actions along α' are fail_i actions for f different i ∈ J_k. 


2. For every occurrence of an action a along α', and every i ∈ J_k, if task(a) = st_{k,i}, then a = 
dummy_{k,i}. That is, whenever st_{k,i} is scheduled along α', the dummy_{k,i} action is chosen. Since 
dummy_{k,i} is enabled at all states of α' except the first, it is certainly possible to always choose 
to schedule the dummy_{k,i} action in this way, along α'. 


Since P is f-tolerant, f > 1, a decide(v)_j action, for every nonfaulty process, must occur along 
α'. Let α'_d be the prefix of α' ending in the state just after the first such decide(v)_j action. Let 
σ = schedule(α'_d). From σ, derive the schedule σ' by removing: 


1. Every occurrence of a fail_i, and 


2. Every occurrence of st_{k,i}, for all i ∈ I (these all correspond to dummy_{k,i} actions in α'_d). 


It is clear that σ' is a failure-free schedule. Since, in α'_d, the transitions corresponding to the above 
task occurrences do not induce any change of state other than to S_k, which is silent, it follows that 
σ' is applicable to s_0, and that σ'(α_{s_0}) contains a single decide action. 


By the case condition, s_0 and s_1 differ only in the state of S_k. Since processes and channels are 
deterministic, and since services have a deterministic type and also behave as given by Figure 1, 
we can see that σ' is applicable to s_1, and that σ'(α_{s_1}) is the same as σ'(α_{s_0}), with the exception of 
the local state of S_k. In particular, σ'(α_{s_0}) and σ'(α_{s_1}) contain the same action subsequence. So, 
σ'(α_{s_0}) and σ'(α_{s_1}) contain the same single decide(v)_j action, for some v ∈ {0,1}. Choosing v = 0 
contradicts the 1-valency of s_1, and choosing v = 1 contradicts the 0-valency of s_0. 


CASE 3: participants(e, s') ∩ participants(e', s') = C_{i,j}. Since P_i and C_{i,j} are deterministic, 
and e ≠ e', it follows that one of action(e, s'), action(e', s') is a send(m)_{i,j}, and the other is 
a receive(m')_{i,j}, for some m, m' ∈ M. Since these are both enabled in s', it follows from the 
definition of a FIFO channel (see [Lyn96, chapter 14]) that transition(e,s') and transition(e',s') 
commute. The remainder of the argument is similar to Subcase 2.1. 


CASE 4: participants(e, s') ∩ participants(e', s') = P_i. 
Since α_{s'} is bivalent, then, under the assumption that P solves f-fault-tolerant consensus, α_{s'} 
cannot contain any decide actions, since that would violate agreement. 


Let α'' be an infinite fair execution that extends α_{s'}, and let α' be the suffix of α'' starting in 
state s'. Furthermore, let α' be chosen such that: 


1. The action along α' that starts in s' is fail_i, and 


2. No fail_j actions, j ≠ i, occur along α', and 


3. For every action a, and every occurrence of a along α', if task(a) = st_{k,i} for some k ∈ K, then 
a = dummy_{k,i}. That is, whenever st_{k,i} is scheduled along α', the dummy_{k,i} action is chosen. 
Since dummy_{k,i} is enabled at all states, except the first, of any execution fragment that starts 
with fail_i, it is certainly possible to always choose to schedule the dummy_{k,i} action in this 
way, along α'. 


Since P is f-tolerant, f > 1, a decide(v)_j action, for every j ≠ i, must occur along α'. Let α'_d be 
the prefix of α' ending in the state just after the first such decide(v)_j action. Let σ = schedule(α'_d). 
From σ, derive the schedule σ' by removing: 


1. The single occurrence of fail_i, and 


2. Every occurrence of st_{k,i}, for all k ∈ K (these all correspond to dummy_{k,i} actions in α'_d), and 


3. Every occurrence of ct_{j,i}, for all j ∈ I, j ≠ i. 


Since the only fail action along σ is fail_i, it is clear that σ' is a failure-free schedule. Since, in 
α'_d, the transitions corresponding to the above task occurrences do not induce any change of state 
other than to P_i, which has failed, it follows that σ' is applicable to s', and that σ'(α_{s'}) contains a 
single decide action. We now establish Claims 4.1 and 4.2. 


Claim 4.1: 


1. σ' is applicable to s_0. 


2. Let γ be the suffix of σ'(α_{s'}) starting in s', and let γ_0 be the suffix of σ'(α_{s_0}) starting in s_0. 
Then γ, γ_0 contain the same decide actions. 


We establish the claim by case analysis on the possibilities for action(e,s'). From the Case 4 
condition, we have that P_i ∈ participants(e,s'). This restricts the possibilities for action(e,s') to 
the following. 


Subcase 4.1.1: action(e,s') = a_{k,i}, a ∈ T_k.invs. By definition, σ' contains no occurrence of 
st_{k,i}. Hence, γ contains no action in st_{k,i}. Let γ_{00} be the same as γ except that, for corresponding 
states along γ_{00}, the invocation buffer of S_k contains additionally the invocation (i,a). Since γ_{00} 
contains no action in st_{k,i}, this extra invocation is never processed (by a perform() action) along 
γ_{00}. Hence, the state-action-state triples along γ_{00} are actual transitions of P (i.e., elements of 
steps(P)). Thus, γ_{00} is an actual execution fragment of G(P). Furthermore, the first state of γ_{00} 
is s_0, and schedule(γ_{00}) = σ'. Hence σ' is applicable to s_0. Now γ_{00} is the suffix of σ'(s_0) starting 
in s_0. Also, γ and γ_{00} contain the same subsequence of actions, and so in particular contain the 
same decide actions. Letting γ_0 = γ_{00} establishes the claim in this case. 


Subcase 4.1.2: action(e, s') = b_{k,i}, b ∈ T_k.resps. By definition, σ' contains no occurrence of 
pt_i nor of st_{k,i}. Let γ be the suffix of σ'(α_{s'}) starting in s'. Hence, γ contains no action in pt_i nor in 
st_{k,i}. Let γ_{00} be the same as γ except that, for corresponding states along γ_{00}, the response buffer 
of S_k is missing the response (i,b), and the state of P_i is the result of executing input action b_{k,i} in 
state s'. 


We now argue that every state-action-state triple along γ_{00} is in steps(P), i.e., is an actual 
transition of P. Since γ_{00} contains no actions in pt_i, this difference in P_i's local state does not cause 
any state-action-state triple along γ_{00} to not be a transition of P, since no action along γ_{00} either 
depends on (for enablement) or changes P_i's local state. Likewise, since γ_{00} contains no actions 
in st_{k,i}, the difference in the response buffer of S_k cannot cause any state-action-state triple 
along γ_{00} to not be a transition of P, since no action along γ_{00} depends on (for enablement) 
those elements of S_k's response buffer of the form (i,b), nor does any such action add or remove 
elements of the form (i,b) to S_k's response buffer. Thus, γ_{00} is an actual execution fragment of 
G(P). Furthermore, the first state of γ_{00} is s_0, and schedule(γ_{00}) = σ'. Hence σ' is applicable to 
s_0. Now γ_{00} is the suffix of σ'(s_0) starting in s_0. Also, γ and γ_{00} contain the same subsequence 
of actions, and so in particular contain the same decide actions. Letting γ_0 = γ_{00} establishes the 
claim in this case. 


Subcase 4.1.3: action(e, s') = send(m)_{i,j}, m ∈ M. By definition, σ' contains no occurrence 
of pt_i. Let γ be the suffix of σ'(α_{s'}) starting in s'. Hence, γ contains no action in pt_i. Also, message 
m is not received by P_j along γ, since it was not sent. (Wlog, we assume that all messages are 
tagged with unique identifiers. This is for the purpose of the proof only, and is not a restriction 
on the assumed system P.) Let γ_{00} be the same as γ except that, for corresponding states along 
γ_{00}, C_{i,j} contains in addition message m at its end (i.e., m is the "last" message in C_{i,j}; recall that 
channels are FIFO), and the state of P_i is the result of executing output action send(m)_{i,j} in state 
s'. 


We now argue that every state-action-state triple along γ_{00} is in steps(P), i.e., is an actual 
transition of P. Since γ_{00} contains no actions in pt_i, this difference in P_i's local state does not cause 
any state-action-state triple along γ_{00} to not be a transition of P, since no action along γ_{00} either 
depends on (for enablement) or changes P_i's local state. Likewise, the difference in the contents of 
C_{i,j} cannot cause any state-action-state triple along γ_{00} to not be a transition of P. The only triples 
that could possibly be affected are those whose action is receive(m')_{i,j} for some m' ∈ M. But all 
such triples will correspond to the reception of the message m' actually at the head of C_{i,j} (in the 
initial global state of the triple), since the only difference in the contents of C_{i,j} is that an extra 
message has been appended at the rear of C_{i,j}. In other words, C_{i,j} delivers the same sequence of 
messages along γ_{00} that it does along γ. Hence, all these triples will be actual transitions of P. 
Thus, γ_{00} is an actual execution fragment of G(P). Furthermore, the first state of γ_{00} is s_0, and 
schedule(γ_{00}) = σ'. Hence σ' is applicable to s_0. Now γ_{00} is the suffix of σ'(s_0) starting in s_0. Also, 
γ and γ_{00} contain the same subsequence of actions, and so in particular contain the same decide 
actions. Letting γ_0 = γ_{00} establishes the claim in this case. 
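

The FIFO observation used in this subcase, that an extra message appended at the rear does not change the sequence of messages delivered so far, can be checked with a small sketch. The helper `deliver_n` is hypothetical and models a channel as a double-ended queue; it is an illustration, not part of the assumed system P.

```python
from collections import deque
from typing import List, Sequence


def deliver_n(contents: Sequence[str], n: int) -> List[str]:
    """Deliver the first n messages of a FIFO channel holding `contents`, head first."""
    channel = deque(contents)
    return [channel.popleft() for _ in range(n)]
```

With contents `["m1", "m2"]` and with contents `["m1", "m2", "m"]`, the first two deliveries are identical; only the undelivered tail of the channel differs.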


Subcase 4.1.4: action(e, s') = receive(m)_{j,i}, m ∈ M. By definition, σ' contains no occurrence 
of pt_i nor of ct_{j,i}. Let γ be the suffix of σ'(α_{s'}) starting in s'. Hence, γ contains no action in pt_i 
nor in ct_{j,i}. Let γ_{00} be the same as γ except that, for corresponding states along γ_{00}, C_{j,i} is missing 
the message m at its head, and the state of P_i is the result of executing input action receive(m)_{j,i} 
in state s'. 


We now argue that every state-action-state triple along γ_{00} is in steps(P), i.e., is an actual 
transition of P. Since γ_{00} contains no actions in pt_i, this difference in P_i's local state does not cause 
any state-action-state triple along γ_{00} to not be a transition of P, since no action along γ_{00} either 
depends on (for enablement) or changes P_i's local state. Likewise, the difference in the contents 
of C_{j,i} cannot cause any state-action-state triple along γ_{00} to not be a transition of P, since γ_{00} 
contains no action in ct_{j,i}. Thus, γ_{00} is an actual execution fragment of G(P). Furthermore, the 
first state of γ_{00} is s_0, and schedule(γ_{00}) = σ'. Hence σ' is applicable to s_0. Now γ_{00} is the suffix of 
σ'(s_0) starting in s_0. Also, γ and γ_{00} contain the same subsequence of actions, and so in particular 
contain the same decide actions. Letting γ_0 = γ_{00} establishes the claim in this case. 


Subcase 4.1.5: action(e,s') = decide(v)_i or action(e,s') is an internal action of P_i. By 
definition, σ' contains no occurrence of pt_i. Let γ be the suffix of σ'(α_{s'}) starting in s'. Hence, γ 
contains no action in pt_i. Let γ_{00} be the same as γ except that, for corresponding states along γ_{00}, 
the state of P_i is the result of executing action(e, s'). 


We now argue that every state-action-state triple along γ_{00} is in steps(P), i.e., is an actual 
transition of P. Since γ_{00} contains no actions in pt_i, this difference in P_i's local state does not cause 
any state-action-state triple along γ_{00} to not be a transition of P, since no action along γ_{00} either 
depends on (for enablement) or changes P_i's local state. Thus, γ_{00} is an actual execution fragment 
of G(P). Furthermore, the first state of γ_{00} is s_0, and schedule(γ_{00}) = σ'. Hence σ' is applicable 
to s_0. Now γ_{00} is the suffix of σ'(s_0) starting in s_0. Also, γ and γ_{00} contain the same subsequence 
of actions, and so in particular contain the same decide actions. Letting γ_0 = γ_{00} establishes the 
claim in this case. 


From our definition of distributed system with services, we see that the above are all the possible 
cases for action(e, s’). Having established Claim 4.1 in each case, we conclude that it holds generally. 
(end proof of Claim 4.1) 


Claim 4.2: 


1. σ' is applicable to s_1. 


2. Let γ be the suffix of σ'(α_{s'}) starting in s', and let γ_1 be the suffix of σ'(α_{s_1}) starting in s_1. 
Then γ, γ_1 contain the same decide actions. 


From the Case 4 condition, we have that P_i ∈ participants(e',s'). Hence, we can apply exactly the 
same argument as used in the proof of Claim 4.1, with e' in place of e, to conclude that: 


1. σ' is applicable to s''. 


2. Let γ'' be the suffix of σ'(α_{s''}) starting in s''. Then γ, γ'' contain the same decide actions. 


From the Case 4 condition, we have that P_i ∈ participants(e, s'). Hence, e = pt_i, or e = ct_{j,i}, 
or e = st_{k,i}, with action(e,s') = b_{k,i} for some b ∈ T_k.resps. If e = pt_i or e = ct_{j,i}, then clearly 
P_i ∈ participants(e,s''). If e = st_{k,i}, with action(e,s') = b_{k,i} for some b ∈ T_k.resps, then, by 
well-formedness of P_i w.r.t. S_k, and P_i ∈ participants(e',s'), it follows that action(e',s') ≠ a_{k,i}, for all 
a. From e ≠ e' it follows that action(e',s') ≠ b_{k,i} for all b ∈ T_k.resps, since otherwise we would 
have e' = e = st_{k,i}. Hence, from P_i ∈ participants(e',s'), we conclude S_k ∉ participants(e',s'). 
Hence, the local state of S_k is the same in s' and s'', i.e., s'|S_k = s''|S_k. Since action(e,s') = b_{k,i}, 
we know that in state s', (i,b) is in the response buffer of S_k. Hence, we conclude that in state 
s'', (i,b) is in the response buffer of S_k. Thus, by well-formedness of P_i w.r.t. S_k, in state s'', the 
invocation buffer of S_k contains no invocation (i,a), for any a ∈ T_k.invs. Now s'' lies along a 
fault-free execution. Hence, dummy_{k,i} is not enabled in s''. Hence, in state s'', the only action of task 
st_{k,i} that is enabled is b_{k,i} (see Figure 1). Hence action(e,s'') = b_{k,i}. Hence P_i ∈ participants(e,s''). 


Thus, for all possible cases of e, we have established P_i ∈ participants(e,s''). Hence, from (1) 
σ' is applicable to s'', and (2) γ, γ'' contain the same decide actions, which we showed above, we 
can apply exactly the same argument as used in the proof of Claim 4.1 to establish Claim 4.2. 
(end proof of Claim 4.2) 


Since o’ is a failure-free schedule, and a,, is a finite failure-free execution, we conclude that 
o'(as,) is a finite failure-free execution. Since so is 0-valent, it follows that o’(as,) contains at least 
one decide(0); action, for some j € I. 


Since o’ is a failure-free schedule, and as, is a finite failure-free execution, we conclude that 
o'(as,) is a finite failure-free execution. Since s1 is 1-valent, it follows that o’(a,,) contains at least 
one decide(1), action, for some j’ € J. 


Let γ be the suffix of σ'(α_{s'}), γ_0 be the suffix of σ'(α_{s_0}), and γ_1 be the suffix of σ'(α_{s_1}).
From Claims 4.1 and 4.2, we have that γ, γ_0, and γ_1 all contain the same decide actions. By its
construction, γ contains a single decide action. Hence, γ_0 and γ_1 contain a single decide(v)_ℓ action in
common, for some v ∈ {0,1}, ℓ ∈ I. Choosing v = 0 contradicts the 1-valency of s_1, and choosing
v = 1 contradicts the 0-valency of s_0. Hence, we have derived the desired contradiction.

(end of CASE 4) 


Since we have established a contradiction in all of CASES 1-4, the lemma holds. 


Lemma 6 Let α_s be a finite input-first failure-free bivalent execution of G(P), and let last(α_s) = s.
Let e be a task of P applicable to α_s. Let
U = {α_x | α_x = σ(α_s), σ is a finite failure-free schedule applicable to α_s and not containing e},
V = {e(α_x) | α_x ∈ U and e is applicable to α_x}.
Then V contains a bivalent execution.


Proof. In the statement of Lemma 4, α_s is a finite failure-free execution and σ is a finite failure-free
schedule. Hence, condition (2) of Lemma 4 is the existence of a hook in G(P) whose stem is a finite
input-first failure-free execution. By Lemma 5, we know that (2) cannot hold. Thus, the desired
result follows immediately from Lemma 4.


We now present the proof of Theorem 1: 


Assume that P is such a distributed system with services. Using Lemma 6, we construct an
infinite execution γ of P in which no decide action occurs. By Lemma 3, P must have a bivalent
initialization. Call it γ_0. We now apply Lemma 6 to extend γ_0 repeatedly.


Fix an arbitrary round-robin order of all the tasks in P, except for the init(v)_i and fail_i tasks.
Let γ_i be the current execution, and let t_i be the next task in the round-robin order. Assume
inductively that γ_i is bivalent (γ_0 gives the base case).


If t_i is not applicable to last(γ_i), then move on to the next task in the round-robin order, and so
on, until an applicable task is found. Since the process tasks are always applicable, we are guaranteed
to find an applicable task. So, without loss of generality, let t_i be this task.
By Lemma 6, applied with e = t_i, there is a bivalent extension γ_{i+1} of γ_i such that the last action
along γ_{i+1} is in task t_i.


Let γ be the unique execution such that for all i ≥ 0, γ_i is a prefix of γ. If a task t is continuously
enabled, then, when it is selected in the round-robin order, it will be found applicable to the last
state of the current execution. Hence, the extension will contain an action from t. Along γ, this
will happen infinitely often. Hence, γ satisfies the I/O automaton weak fairness condition. Since γ
has infinitely many prefixes γ_i, i ≥ 0, that are executions of P, it thus follows that γ is an execution
of P. Since none of the γ_i contain a decide action, it follows that γ does not either. This contradicts
the termination requirement of consensus, so no such P exists.
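

The round-robin construction above can be sketched in code. The following is only an illustration under stated assumptions: the names round_robin_extend, applicable, and extend are hypothetical, and extend stands in for the extension whose existence Lemma 6 guarantees (the lemma is an existence result, not an algorithm). The toy instance at the end models executions as tuples of task names and is not the system P.

```python
from itertools import cycle
from typing import Callable, List, Sequence, TypeVar

E = TypeVar("E")  # a finite execution (hypothetical representation)
T = TypeVar("T")  # a task


def round_robin_extend(start: E, tasks: Sequence[T],
                       applicable: Callable[[T, E], bool],
                       extend: Callable[[E, T], E],
                       rounds: int) -> List[E]:
    """Build the chain gamma_0, gamma_1, ..., gamma_rounds.

    `extend` is an assumed oracle playing the role of Lemma 6: given the
    current execution and an applicable task, it returns an extension whose
    last action belongs to that task.
    """
    chain = [start]
    order = cycle(tasks)  # fixed round-robin order over all tasks
    for _round in range(rounds):
        current = chain[-1]
        # Skip tasks that are not applicable; at most one full pass is needed
        # when some task (e.g. a process task) is always applicable.
        for _attempt in range(len(tasks)):
            task = next(order)
            if applicable(task, current):
                chain.append(extend(current, task))
                break
        else:
            raise RuntimeError("no applicable task in a full pass")
    return chain


# Toy instance: task "a" is always applicable, task "b" only at even length.
toy_chain = round_robin_extend(
    start=(),
    tasks=["a", "b"],
    applicable=lambda t, ex: t == "a" or len(ex) % 2 == 0,
    extend=lambda ex, t: ex + (t,),
    rounds=4,
)
```

Each element of the returned chain is a prefix of the next one, which mirrors the role of the prefixes γ_i of γ in the argument.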


5 k-set consensus 


We now show that when the system is solving a problem that is weaker than consensus, namely 
k-consensus (section 2.2), it is possible to boost the fault-tolerance level. Assume we have available 
f-fault-tolerant k-consensus services, each one with m ports. An f’-fault-tolerant algorithm that 
solves k'-consensus is as follows. Take a principal subset of the processes, and divide it into s 
disjoint groups, each one accessing a different service. Each principal process participates in an 
execution proposing its input value to its designated service. If and when it gets a decision back, it 
sends the decision to all the other processes in the entire set of processes (not just those involved 
in the same consensus service). Meanwhile, each principal process collects all the results it receives 
from all processes, and decides on any of these results. The remaining processes simply wait for 
a result from one of the principal processes. The values of k’ and f’ depend on the size of the 
principal set, and on the number s of services we divide it into. There is a tradeoff between k’ and 
f': if a small number of failures f’ is tolerated, then a high degree of agreement is achieved, namely
a small k’. If more failures f’ must be tolerated, then a lower degree of agreement is achieved, 
namely a large k’. 
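

As a rough illustration of the algorithm just described, the following sketch simulates a single run. It rests on several simplifying assumptions that are not part of the text: processes are identified by integers, faulty processes crash before taking any step, a service with more than f crashed participants returns nothing, a working service returns the k smallest values proposed by its live participants, and each process picks a received result by a fixed deterministic rule. The function name boost_k_consensus is hypothetical.

```python
from typing import Dict, List, Set


def boost_k_consensus(inputs: Dict[int, int], groups: List[List[int]],
                      crashed: Set[int], k: int, f: int) -> Dict[int, int]:
    """Simulate one run; return the decision of every non-crashed process.

    inputs  : input value of every process (principal or not)
    groups  : disjoint groups of principal processes, one group per service
    crashed : processes assumed to crash before taking any step
    k, f    : every modeled service is an f-tolerant k-consensus service
    """
    results_sent: List[int] = []  # service decisions sent to all processes
    for group in groups:
        live = [p for p in group if p not in crashed]
        if len(group) - len(live) > f:
            continue  # more than f failures: the modeled service gives no output
        # Modeled service: at most k distinct values, all of them proposed.
        allowed = sorted({inputs[p] for p in live})[:k]
        for idx, _p in enumerate(live):
            results_sent.append(allowed[idx % len(allowed)])
    if not results_sent:
        return {}  # no service answered, so nobody decides in this run
    # Every live process decides on one of the results it received.
    return {p: results_sent[p % len(results_sent)]
            for p in inputs if p not in crashed}


inputs = {p: 10 + p for p in range(6)}   # processes 4 and 5 are not principal
groups = [[0, 1], [2, 3]]                # s = 2 services, f + 1 = 2 ports used
run_with_crashes = boost_k_consensus(inputs, groups, crashed={0, 2, 3}, k=1, f=1)
run_failure_free = boost_k_consensus(inputs, groups, crashed=set(), k=1, f=1)
```

In the first run three processes crash, which is fewer than s·(f + 1) = 4, so one service still answers and every live process decides. In the second run both services answer and at most s·k = 2 distinct values are decided.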


To prove correctness, we divide the principal processes appropriately into the services they
access. We must ensure that fewer than s·(f + 1) principal processes can fail, i.e., f’ < s·(f + 1), to
guarantee that at least one service S has at most f failures. Service S is therefore not killed, and
moreover, S has at least one nonfaulty participant, which succeeds in sending the value to everyone.
That means that every nonfaulty process decides. The value of k’, i.e., the number of possible
different decision values, is at most s·k: there are at most k different values returned per service;
more precisely, at most k values per service being accessed by at least k processes, and c values for a
service that is being accessed by c processes for c < k. Thus, for a desired overall fault-tolerance f’,
we want the smallest possible k’ and so we find the smallest integer s that guarantees f’ < s·(f + 1).
Thus we use s = ⌈(f’ + 1)/(f + 1)⌉ services, and take the first f’ + 1 processes to be the principal
processes (f’ + 1 processes using as few services as possible, each one with f + 1 input ports). It
follows that


Theorem 7 For any 1 ≤ k ≤ m, k ≤ f ≤ m − 1, 1 ≤ f’ ≤ n − 1, it is possible to solve f’-tolerant
k’-consensus for an endpoint set of n processes using f-tolerant k-consensus services, each one with
m ports, for 


When each service is completely reliable, that is, f = m − 1, and we divide the processes as
described above, this algorithm reduces to the one of [HR00], and gives an upper bound proved to
be tight using topology. As an example, suppose we want to build an f’-fault-tolerant algorithm,
with f’ = 2c − 1, for an endpoint set containing at least 2c processes, using only 1-fault-tolerant
consensus services, i.e., f = 1, k = 1. The smallest k’ for which we can do this is k’ = c, using
s = c services, each with 2 processes (f’ + 1 = 2c principal processes).
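

The arithmetic in this example can be checked with a short sketch. The function names are hypothetical, and agreement_bound computes only the coarse bound s·k stated in the correctness argument, not the refined per-service count.

```python
def services_needed(f_prime: int, f: int) -> int:
    """Smallest integer s with f' < s * (f + 1), i.e. ceil((f' + 1) / (f + 1))."""
    return -(-(f_prime + 1) // (f + 1))  # ceiling division on integers


def agreement_bound(f_prime: int, f: int, k: int) -> int:
    """Coarse bound s * k on the number of distinct decision values."""
    return services_needed(f_prime, f) * k


# The example from the text: f' = 2c - 1, f = 1, k = 1 gives s = c and k' = c.
example = [(c, services_needed(2 * c - 1, 1), agreement_bound(2 * c - 1, 1, 1))
           for c in range(1, 6)]
```

For every c in the list, the second and third entries of the tuple equal c, matching s = c and k’ = c above.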


6 Further Work and Conclusions 


We studied the consensus problem in an asynchronous distributed system with stopping failures, 
and where processes can access services that abstract oracles such as hardware primitives or failure 
detectors. Many papers have studied a similar model, but to our knowledge this is the first time 
services that are implemented by the processes in the system are considered. We showed that f- 
tolerant consensus is not achievable using less fault-tolerant consensus services as building blocks, 
but that k-consensus can be solved with less fault-tolerant k’-consensus services as building blocks. 


Our algorithm for k-consensus generalizes that of [HR94, HR00] for reliable services. That 
algorithm achieves a tight upper bound. It is an open question what is the exact situation for 
k-set consensus in our model: for which k, k’, f, f’ is it possible to construct a k-consensus service 
tolerating f failures from k’-consensus services tolerating f’ failures each? This seems to lead to
more general hierarchy results, in the style of Herlihy’s universality result [Her91], the consensus
wait-free hierarchy [Jay97], and the set-consensus hierarchy, e.g., [BG93], all of these for services
that can fail in our sense.


A Technical Background


Definition 1 (I/O Automaton) An I/O automaton A consists of five components: 


1. A set of states states(A). 


2. A nonempty set start(A) ⊆ states(A) of start states.


3. A signature sig(A) = (in(A), out(A), int(A)) where in(A), out(A), and int(A) are disjoint
sets of input, output, and internal actions, respectively. Denote by local(A) the set out(A) ∪
int(A) and by acts(A) the set in(A) ∪ out(A) ∪ int(A).


4. A task partition tasks(A), which is a partition of local(A) into at most a countable number 
of classes. 


5. A transition relation steps(A) ⊆ states(A) × acts(A) × states(A).


Let s, s’, u, u’, ... range over states and a, b, ... range over actions. We say that a is enabled in
state s iff there exists a state s’ such that (s, a, s’) ∈ steps(A). If t is a task and some action a ∈ t is
enabled in state s, then we say that task t is enabled in state s.


An execution fragment of A is an alternating sequence of states and actions s_0 a_1 s_1 ... s_{i−1} a_i s_i ...
such that for all i ≥ 1, (s_{i−1}, a_i, s_i) ∈ steps(A), i.e., the sequence conforms to the transition relation of
A. An execution of A is an execution fragment that begins with a state in start(A).
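

The definitions so far can be mirrored in a small data structure. This is a minimal sketch for finite automata only, assuming hashable states and actions; the class name IOAutomaton, its method names, and the toggle example are illustrative and not taken from the text.

```python
from dataclasses import dataclass
from typing import FrozenSet, Hashable, Sequence, Tuple

State = Hashable
Action = Hashable


@dataclass(frozen=True)
class IOAutomaton:
    states: FrozenSet[State]
    start: FrozenSet[State]                   # nonempty subset of states
    inputs: FrozenSet[Action]
    outputs: FrozenSet[Action]
    internals: FrozenSet[Action]
    tasks: Tuple[FrozenSet[Action], ...]      # partition of outputs | internals
    steps: FrozenSet[Tuple[State, Action, State]]

    def enabled(self, action: Action, state: State) -> bool:
        """An action is enabled in s iff some (s, action, s') is a step."""
        return any(s == state and a == action for (s, a, _t) in self.steps)

    def task_enabled(self, task: FrozenSet[Action], state: State) -> bool:
        """A task is enabled in s iff one of its actions is enabled in s."""
        return any(self.enabled(a, state) for a in task)

    def is_fragment(self, seq: Sequence) -> bool:
        """Check a finite alternating sequence s_0, a_1, s_1, ..., a_n, s_n."""
        if len(seq) % 2 == 0 or seq[0] not in self.states:
            return False
        return all((seq[i], seq[i + 1], seq[i + 2]) in self.steps
                   for i in range(0, len(seq) - 2, 2))

    def is_execution(self, seq: Sequence) -> bool:
        """An execution is a fragment that begins in a start state."""
        return self.is_fragment(seq) and seq[0] in self.start


# Illustrative automaton with one output action that flips a two-state switch.
toggle = IOAutomaton(
    states=frozenset({"off", "on"}),
    start=frozenset({"off"}),
    inputs=frozenset(),
    outputs=frozenset({"flip"}),
    internals=frozenset(),
    tasks=(frozenset({"flip"}),),
    steps=frozenset({("off", "flip", "on"), ("on", "flip", "off")}),
)
```

With this toggle, the sequence off, flip, on is an execution, while on, flip, off is an execution fragment but not an execution, because "on" is not a start state.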


If α is a finite execution or execution fragment, then first(α) denotes the first state of α, and
last(α) denotes the last state of α. If α is a finite execution or execution fragment, α’ is an execution
fragment, and last(α) = first(α’), then α ⌢ α’ denotes the concatenation of α and α’.